Multi-resolution auditory scene analysis: robust speech recognition using pattern-matching from a noisy signal

Authors

Sue Harding

Georg Meyer

Abstract

Unlike automatic speech recognition systems, humans can understand speech when other competing sounds are present. Although the theory of auditory scene analysis (ASA) may help to explain this ability, some perceptual experiments show fusion of the speech signal under circumstances in which ASA principles might be expected to cause segregation. We propose a model of multi-resolution ASA that uses both high- and low-resolution representations of the auditory signal in parallel in order to resolve this conflict. The use of parallel representations reduces variability for pattern-matching while retaining the ability to identify and segregate low-level features of the signal. An important feature of the model is the assumption that features of the auditory signal are fused together unless there is good reason to segregate them. Speech is recognised by matching the low-resolution representation to previously learned speech templates without prior segregation of the signal into separate perceptual streams; this contrasts with the approach generally used by computational models of ASA. We describe an implementation of the multi-resolution model, using hidden Markov models, that illustrates the feasibility of this approach and achieves much higher identification performance than standard techniques used for computer recognition of speech mixed with other sounds.
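The idea of keeping a high-resolution representation while matching templates against a low-resolution one can be illustrated with a toy sketch. This is not the authors' implementation: the abstract states that hidden Markov models were used, whereas the sketch below swaps in a plain nearest-template match by mean squared distance. The function names `low_resolution` and `best_template`, the channel-averaging scheme, and the toy data are all assumptions made for illustration.

```python
from typing import Mapping

import numpy as np


def low_resolution(spectrogram: np.ndarray, factor: int) -> np.ndarray:
    """Average groups of `factor` adjacent frequency channels.

    `spectrogram` has shape (frames, channels). Channel averaging is an
    assumed stand-in for whatever low-resolution front end the paper uses.
    """
    frames, channels = spectrogram.shape
    if channels % factor != 0:
        raise ValueError("channel count must be divisible by factor")
    return spectrogram.reshape(frames, channels // factor, factor).mean(axis=2)


def best_template(low_res: np.ndarray, templates: Mapping[str, np.ndarray]) -> str:
    """Return the label of the template closest in mean squared distance.

    Hypothetical simplification: the paper matches with hidden Markov
    models, not with a fixed-length distance like this one.
    """
    return min(
        templates,
        key=lambda label: float(np.mean((low_res - templates[label]) ** 2)),
    )


# Toy high-resolution "signal": energy in the lower four of eight channels,
# plus a small amount of noise.
rng = np.random.default_rng(0)
high_res = np.zeros((4, 8))
high_res[:, :4] = 1.0
noisy = high_res + 0.1 * rng.standard_normal(high_res.shape)

# The low-resolution view is used for matching; `noisy` stays available in
# parallel for any analysis that needs fine spectral detail.
low_res = low_resolution(noisy, factor=4)
templates = {
    "low_band": np.tile([1.0, 0.0], (4, 1)),
    "high_band": np.tile([0.0, 1.0], (4, 1)),
}
label = best_template(low_res, templates)
print(label)
```

Averaging over channels smooths out the per-channel noise, which is one simple way to see why a coarser representation might reduce variability for pattern-matching, as the abstract argues.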

Related resources

Improving the performance of MFCC for Persian robust speech recognition

The Mel frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...

A new method for robust speech recognition based on missing data using a bidirectional neural network

The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing feature method. In this approach, the components in the time-frequency representation of the signal (spectrogram) that present a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced by remained components and statistical ...

A case for multi-resolution auditory scene analysis

A commonly held view of auditory scene analysis is that complex auditory environments are segregated into separate perceptual streams using primitive cues that can be attended to separately. We argue that this view is inconsistent with the majority of perceptual data reported in the literature and propose an alternative model that is based on a primary, low resolution signal representation used...

Missing data techniques for robust speech recognition

In noisy listening conditions, the information available on which to base speech recognition decisions is necessarily incomplete: some spectro-temporal regions are dominated by other sources. We report on the application of a variety of techniques for missing data in speech recognition. These techniques may be based on marginal distributions or on reconstruction of missing parts of the spectrum...

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have shown their performance in speech recognition systems for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...

Publication date: 2003
